Abstract

Modus Ponens says that if you know A and you know that A im- 
plies B then you know B. This is a basic rule that we take for granted 
and use repeatedly, but there is a gem of a theorem in logic by Gentzen 
to the effect that it is not needed in some logical systems. It is fun to 
say "You can make proofs without lemmas" to mathematicians and 
watch how they react, but our true intention here is to let go of logic 
as a reflection of reasoning and move towards combinatorial aspects. 
Proofs contain basic problems of algorithmic complexity within their 
framework, and there is strong geometric and dynamical flavor inside 
them. 

1 The beginning of the story 

Mathematicians are making proofs every day. In proof theory one studies 
proofs. This is frightening to many mathematicians, but a principal theme 
of the present exposition is to treat logic unemotionally. 

The idea of complexity sheds an interesting light on proofs. A basic 
question is whether a propositional tautology of size n should always have a 
short proof, a proof of size p(n) for some polynomial p for instance. There
is a proof system in which this is true if and only if NP = co-NP. The
latter is an unsolved general question in computational complexity which is 
related to the existence of polynomial-time algorithms in cases where only 
exponential algorithms are known. We shall say more about this later. The 
equivalence was established in [16]. 

Sometimes proofs have to be long if one does not permit a rule like Modus 
Ponens. Such a rule allows dynamics within the implicit computations oc- 
curring in proofs. In tracing through a proof one may visit the same formula 
repeatedly with many substitutions. The level in the hierarchy of proof
systems at which dynamics appears is the first at which little is known
concerning the existence of short proofs of hard tautologies. At lower levels it is
known that there are tautologies with only exponential size proofs. Dynami- 
cal structure in proofs seems to be important for making proofs short. (This 
is discussed further in [14].) 

The idea of dynamical structure in proofs may seem odd at first, but it 
is very natural, both in ordinary mathematical activity and in formal logic. 
There is a way to eliminate the nontrivial dynamical structure from a proof, 
but at great cost in expansion. This is the matter of cut elimination that we 
shall discuss here. 

Before we get to that we should review some background information 
about logic and complexity. Our discussion will be informal, but one can 
find more precision and detail in [30, 36, 48]. We begin with some comments 
about propositional logic, predicate logic, and arithmetic. 

Propositional logic is the simplest of the three. One has variables, often 
called p, etc., and one can build formulas out of them using standard connectives,
$\lor$ (or), $\land$ (and), $\neg$ (negation). In predicate logic one uses the same
connectives, and one also has the quantifiers $\forall$ (for all) and $\exists$ (there exists) for
making formulas like $\exists x\, F(x)$. Here x is a variable and F is a relation, in this
case a unary relation. Relations may depend on more variables. Constants 
and functions may also be used inside arguments of relations, and the func- 
tions are permitted to depend on an arbitrary number of variables. Thus one 
can make substitutions to create formulas like $\forall x\, \exists y\, G(\phi(x, y), \psi(\alpha(x), c))$,
where G is a binary relation, $\phi$, $\psi$ are functions of two variables, x, y are
variables, and c is a constant symbol. Expressions like $\phi(x, y)$, $\psi(\alpha(x), c)$ are
called terms, and in general terms may be constructed from constants and 
variables using any combination of functions. For arithmetic one permits ad- 
ditional symbols to represent numbers and basic arithmetic operations, and 
one adds axioms about the arithmetic objects. For example, in arithmetic 
x + y is a term, x + y = z is a formula, and there is an axiom to the effect
that x + 0 = x.

The provable statements in arithmetic are called theorems. The same 
term is used in any context in which there are special axioms describing a 
mathematical structure, like arithmetic. Statements which are true indepen- 
dently of particular mathematical structure, as in ordinary propositional or 
predicate logic, are called tautologies. 

Propositional and predicate logic are sound and complete. Roughly speak- 
ing this means that provability is equivalent to being true in all interpreta- 
tions. In contexts with extra mathematical structure (like arithmetic) one 
should restrict oneself to interpretations which are compatible with the given 
structure.

These three systems are very different in terms of the levels of complexity
that arise naturally in them. For this discussion we need to have some notions 
from complexity theory, but rather than descend into details let us just say 
a few words. The reader probably has already a reasonable idea of what is 
an algorithm. To make it more formal one has to be precise about what 
are the acceptable inputs and outputs. A typical output might simply be 
an answer of "YES" or "NO". The input should be encoded as a string of 
symbols, a "word" in the language generated by some alphabet. For instance, 
formulas in logic can be encoded in this way. A basic method to measure the 
complexity of an algorithm is to ask how long it takes the algorithm to give 
an answer when the input has length n (= the number of symbols). Is the 
amount of time bounded by a polynomial in n, an exponential in n, some 
tower of exponentials, etc.?

In propositional logic the problems that typically arise are resolvable by 
obvious algorithms in exponential time, and the difficult questions concern 
the existence of polynomial time procedures. For instance, given a proposi- 
tional formula A, let us ask whether it is not a tautology. This question can 
be resolved in exponential time, by checking truth tables. This is actually an 
"NP" problem, which amounts to the fact that if A is not a tautology, then 
there is a fast reason for it, namely a set of truth values for the propositional 
variables inside A for which the result is "False". The difficulty is that this
choice of truth values depends on A and a priori one has to search through 
a tree to find it. This problem turns out to be NP-complete, which means 
that if one can find a polynomial-time algorithm for resolving it, then one 
can do the same for all other NP problems, such as the travelling salesman 
problem, or determining whether a graph has a Hamiltonian cycle. 
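
The truth-table check just described can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the encoding of formulas as nested tuples and the function names are hypothetical and not part of the discussion above. The search either returns a falsifying assignment, which is the short certificate that A is not a tautology, or exhausts all 2^k assignments of the k variables.

```python
from itertools import product

# Hypothetical encoding: ("var", name), ("not", f), ("and", f, g), ("or", f, g).

def variables(f):
    """Collect the names of the propositional variables occurring in f."""
    if f[0] == "var":
        return {f[1]}
    return set().union(*(variables(g) for g in f[1:]))

def evaluate(f, assignment):
    """Truth value of f under a dict mapping variable names to booleans."""
    op = f[0]
    if op == "var":
        return assignment[f[1]]
    if op == "not":
        return not evaluate(f[1], assignment)
    if op == "and":
        return evaluate(f[1], assignment) and evaluate(f[2], assignment)
    if op == "or":
        return evaluate(f[1], assignment) or evaluate(f[2], assignment)
    raise ValueError(f"unknown connective: {op}")

def falsifying_assignment(f):
    """Exhaustive search over the truth table; None means f is a tautology."""
    names = sorted(variables(f))
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        if not evaluate(f, assignment):
            return assignment  # short certificate that f is not a tautology
    return None

p, q = ("var", "p"), ("var", "q")
print(falsifying_assignment(("or", p, ("not", p))))  # a tautology
print(falsifying_assignment(("and", p, q)))          # not a tautology
```

Checking a proposed certificate takes one call to evaluate, whereas finding one may require the whole loop. That asymmetry is what the NP classification expresses.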

By contrast in predicate logic one typically faces issues of algorithmic 
decidability or undecidability. That is, whether there is an algorithm at 
all that always gives the right answer, never mind how long it takes. The 
problem of determining whether a formula in predicate logic is a tautology is 
algorithmically undecidable. One can think of this as a matter of complexity, 
as follows. The tautologies in predicate logic of length at most n form a finite
set for which one can choose a finite set of proofs. Let f(n) denote the
maximum of the lengths of the shortest proofs of tautologies of length at most n.
The fact that there is no algorithm for determining whether or not a formula
is a tautology means that f(n) grows very fast, faster than any recursive
function.

In order to find a proof of a tautology one has to permit a huge expansion. 
This is reminiscent of the word problem for finitely presented groups. Again 
there is no algorithm to determine whether a given word is trivial or not. 
One can think of this in terms of the huge (nonrecursive) expansion of a
word which may be needed to establish its triviality.

This is a nice feature of proof theory and complexity. If one looks at 
the standard examples of complexity problems that deal with exponential 
versus polynomial time, they are usually very different from the standard 
examples of problems that are algorithmically decidable or undecidable. In 
proofs they can fit under the same roof in a nice way. "Herbrand's theorem" 
- discussed in Section 7 below - permits one to code predicate logic in terms 
of propositional logic, but with large expansion. 

What about arithmetic? In arithmetic infinite processes can occur, com- 
ing from mathematical induction. One can still study the level of "infinite 
complexity" though. (This terminology may seem strange, but indeed the 
idea of measuring infinite complexity is present in many areas of mathemat- 
ics, even if it is not always expressed as such. A lot of classical analysis can 
be seen in this way. One cannot describe a general function with a finite 
number of parameters, but one can do this approximately in the presence 
of some smoothness conditions, with the degree of approximation related to 
the degree of smoothness. With little or no smoothness even the approxi- 
mate behavior must be counted in more infinite ways. The differentiability 
almost everywhere of Lipschitz functions provides a good example of this 
phenomenon. One encounters the necessity of infinite processes in topology 
as well.) 

There are dynamics inside proofs, and these dynamics seem to be closely 
related to the matters of complexity just described, in all three cases, of 
propositional logic and predicate logic and arithmetic. Our next task is to 
describe a precise logical system in which we can work, and then explain the 
cut-elimination theorem, which provides an effective procedure for eliminat- 
ing dynamics from proofs. We shall discuss some of the combinatorial aspects 
of cut-elimination, and some of its consequences. We shall give examples of 
proofs which use cuts in an interesting way, starting with the John-Nirenberg 
theorem in real analysis. The proof constructs an expanding tree of intervals 
in the real line by iterating a process that is coded in a single lemma. The 
proof has an interesting dynamical structure, in which an interval that is 
produced by the lemma is then fed back into the lemma to make more inter- 
vals, and so forth. This is an example of the kind of substitution that one 
expects in complicated proofs in predicate logic. We give a more elementary 
example with similar dynamical structure in Section 9. This example comes 
from [12] and is easier to formalize precisely. In Section 10 we explain the 
notion of the logical flow graph which provides a tool for seeing dynamical 
structure within proofs more clearly. 

We would like to thank M. Baaz, M. Gromov, R. Kaye and C. Tomei for 
their comments and suggestions.

2 Sequent calculus



In our discussion of logic so far we have talked about the language - what is a 
formula? - but not about proofs. The choice of proof system is a nontrivial 
matter. We shall work with sequent calculus and the logical system LK. 
This turns out to have nice combinatorial properties, as we shall see. 

The concept of a sequent can be a little confusing at first. A sequent is 
something of the form 

\[ A_1, A_2, \ldots, A_m \to B_1, B_2, \ldots, B_n, \]

where the $A_i$'s and the $B_j$'s are formulas. The interpretation of this sequent
is "from $A_1$ and $A_2$ and ... and $A_m$ follows $B_1$ or $B_2$ or ... or $B_n$". However,
the arrow $\to$ is not a connective, nor is the sequent a formula.

Because we are using $\to$ for sequents in this manner, we use the symbol
$\supset$ for the connective that represents implication. If A and B are formulas,
then $A \supset B$ is also a formula, which is interpreted as "A implies B".

So then what is the point of sequents? Why do we not simply use the 
formula 

\[ A_1 \land A_2 \land \ldots \land A_m \supset B_1 \lor B_2 \lor \ldots \lor B_n \, ? \]

The two have the same interpretation, but they are different combinatorially.
Even in ordinary reasoning we would not normally take all of our information
$A_1, \ldots, A_m$ and package it into the single unit $A_1 \land \ldots \land A_m$. The commas
permit us to treat the formulas $A_i$, $B_j$ as individuals which can each be used
separately. We shall see that this flexibility is important in the system LK,
for which we have certain monotonicity properties in the way that formulas
are constructed.

We should say that in a sequent as above, the formulas $A_i$ and $B_j$ are
permitted to have repetitions, and this turns out to be important. We do
not care about the ordering, however. We might say "multisets" of formulas
to make clear that we mean unordered collections in which repetitions are
counted. We also permit empty sets of formulas, e.g.,

\[ A_1, A_2, \ldots, A_m \to \qquad \text{and} \qquad \to B_1, B_2, \ldots, B_n \]

are permissible sequents. As a matter of notation we shall typically use
upper-case roman letters A, B, C, ... to denote formulas, and upper-case greek
letters like $\Gamma$, $\Delta$, $\Lambda$, ... to denote collections of formulas.

For a nice example of a sequent, consider $\Gamma \to \Delta$, where $\Gamma$ is the collection
of formulas

\[ \Gamma = \Bigl\{ \bigvee_{j=1}^{n} p_{i,j} : i = 1, \ldots, n+1 \Bigr\} \]

and $\Delta$ is the collection of formulas

\[ \Delta = \{ p_{i,j} \land p_{m,j} : 1 \le i < m \le n+1,\ 1 \le j \le n \}. \]

The $p_{i,j}$'s here are propositional variables. The sequent $\Gamma \to \Delta$ represents a
finite version of the pigeon-hole principle: if you have n + 1 balls, and each ball 
is placed in one of n boxes, then there is at least one box which contains two 
balls. It is a valid sequent whose proof is somewhat tricky in propositional 
logic. Normally we think of it as a part of more powerful languages for which 
the proof is immediate, but in propositional logic it is more subtle, as we 
shall see. 
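
To make the pigeon-hole sequent concrete, here is a small Python sketch that builds the two collections for a given n and confirms validity by brute force for very small n. The representation (pairs (i, j) standing for the variable p_{i,j}, lists standing for disjunctions and pairs for conjunctions) and the function names are our own assumptions for illustration. The check is a truth-table search, not a proof in LK, and it says nothing about the size of proofs.

```python
from itertools import product

def pigeonhole_sequent(n):
    """Return (gamma, delta) for n + 1 balls and n boxes.

    The pair (i, j) stands for the variable p_{i,j}: ball i is in box j.
    """
    balls = range(1, n + 2)
    boxes = range(1, n + 1)
    # Gamma: one disjunction per ball, saying that the ball lies in some box.
    gamma = [[(i, j) for j in boxes] for i in balls]
    # Delta: one conjunction per box and per pair of balls i < m in that box.
    delta = [((i, j), (m, j))
             for j in boxes for i in balls for m in balls if i < m]
    return gamma, delta

def sequent_is_valid(n):
    """Check by exhaustive search that all of Gamma implies some formula of Delta."""
    gamma, delta = pigeonhole_sequent(n)
    names = [(i, j) for i in range(1, n + 2) for j in range(1, n + 1)]
    for values in product([False, True], repeat=len(names)):
        v = dict(zip(names, values))
        all_gamma = all(any(v[x] for x in disjunction) for disjunction in gamma)
        some_delta = any(v[x] and v[y] for x, y in delta)
        if all_gamma and not some_delta:
            return False  # a counterexample would be an injective placement
    return True

for n in (1, 2, 3):
    gamma, delta = pigeonhole_sequent(n)
    print(n, len(gamma), len(delta), sequent_is_valid(n))
```

The search runs over 2^(n(n+1)) assignments, so it is only feasible for tiny n.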

Here now is the system LK. It should be interpreted as follows. Our 
basic objects are sequents, in the sense that we prove a sequent, we do not 
prove a formula. If we want to think of proving a formula A, then we should 
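prove the sequent $\to A$. To prove a sequent we begin with axioms and derive
new sequents from them using certain rules. For LK the axioms are sequents
of the form

\[ A, \Gamma \to \Delta, A \]

where A is any formula and $\Gamma$, $\Delta$ are any collections of formulas. The rules
come in two groups, logical rules and structural rules. In these rules we write
$\Gamma$, $\Gamma_1$, $\Gamma_2$, etc., for collections of formulas, and we write $\Gamma_{1,2}$ as a shorthand for
the combination of $\Gamma_1$ and $\Gamma_2$. Remember always that we allow repetitions,
so that one must count multiplicities when combining collections of formulas,
and that we do not care about the ordering of the formulas on either side of
the sequent.

The logical rules are used to introduce connectives, and they are given as
follows:

\[
\neg : \text{left} \quad \frac{\Gamma \to \Delta, A}{\neg A, \Gamma \to \Delta}
\qquad\qquad
\neg : \text{right} \quad \frac{A, \Gamma \to \Delta}{\Gamma \to \Delta, \neg A}
\]

\[
\land : \text{right} \quad \frac{\Gamma_1 \to \Delta_1, A \qquad \Gamma_2 \to \Delta_2, B}{\Gamma_{1,2} \to \Delta_{1,2}, A \land B}
\qquad\qquad
\land : \text{left} \quad \frac{A, B, \Gamma \to \Delta}{A \land B, \Gamma \to \Delta}
\]

\[
\lor : \text{left} \quad \frac{A, \Gamma_1 \to \Delta_1 \qquad B, \Gamma_2 \to \Delta_2}{A \lor B, \Gamma_{1,2} \to \Delta_{1,2}}
\qquad\qquad
\lor : \text{right} \quad \frac{\Gamma \to \Delta, A, B}{\Gamma \to \Delta, A \lor B}
\]

\[
\supset : \text{left} \quad \frac{\Gamma_1 \to \Delta_1, A \qquad B, \Gamma_2 \to \Delta_2}{A \supset B, \Gamma_{1,2} \to \Delta_{1,2}}
\qquad\qquad
\supset : \text{right} \quad \frac{A, \Gamma \to \Delta, B}{\Gamma \to \Delta, A \supset B}
\]

\[
\exists : \text{left} \quad \frac{A(b), \Gamma \to \Delta}{(\exists x)A(x), \Gamma \to \Delta}
\qquad\qquad
\exists : \text{right} \quad \frac{\Gamma \to \Delta, A(t)}{\Gamma \to \Delta, (\exists x)A(x)}
\]

\[
\forall : \text{left} \quad \frac{A(t), \Gamma \to \Delta}{(\forall x)A(x), \Gamma \to \Delta}
\qquad\qquad
\forall : \text{right} \quad \frac{\Gamma \to \Delta, A(b)}{\Gamma \to \Delta, (\forall x)A(x)}
\]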

The structural rules do not involve connectives and are the following: 

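\[
\text{Cut} \quad \frac{\Gamma_1 \to \Delta_1, A \qquad A, \Gamma_2 \to \Delta_2}{\Gamma_{1,2} \to \Delta_{1,2}}
\]

\[
\text{Contraction} \quad \frac{\Gamma \to \Delta, A, A}{\Gamma \to \Delta, A}
\qquad\qquad
\frac{A, A, \Gamma \to \Delta}{A, \Gamma \to \Delta}
\]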

This is the system LK for ordinary predicate logic. For propositional 
logic one drops the rules with quantifiers, and formulas are merely boolean 
combinations of propositional variables (with no functions or substitutions). 

One should be a little careful about the rules for the quantifiers. In
$\exists$ : right and $\forall$ : left, any term t is allowed which does not include a variable
which lies already within the scope of a quantifier in the given formula A. In
$\exists$ : left and $\forall$ : right one has the "eigenvariable" b which should not occur
free in $\Gamma$, $\Delta$.

This presentation of logic is quite different from what one ordinarily sees 
in an undergraduate course. It is easy to check however that the axioms and 
rules make sense in terms of usual reasoning. 

The rule that probably looks the strangest is the cut rule. One can think 
of it as an elaboration on Modus Ponens, and again one can check that it is 
compatible with standard reasoning. 
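
The structural rules are easy to model if one takes the multiset reading of sequents literally. The following Python sketch is only an illustration under our own assumptions: formulas are plain strings, a sequent is a pair of collections.Counter multisets, and the names Sequent, seq, cut, contract_left and contract_right are hypothetical. It shows the cut rule acting as Modus Ponens on the sequents → A and A → B.

```python
from collections import Counter
from typing import NamedTuple

class Sequent(NamedTuple):
    left: Counter   # formulas to the left of the arrow, with multiplicity
    right: Counter  # formulas to the right of the arrow, with multiplicity

def seq(left, right):
    """Build a sequent from two lists of formulas (plain strings here)."""
    return Sequent(Counter(left), Counter(right))

def cut(s1, s2, a):
    """From Gamma1 -> Delta1, A and A, Gamma2 -> Delta2 derive Gamma1,2 -> Delta1,2."""
    if s1.right[a] < 1 or s2.left[a] < 1:
        raise ValueError("cut formula must occur on the right of s1 and the left of s2")
    one_a = Counter([a])
    # Remove one occurrence of A from each premise, then combine with multiplicity.
    return Sequent(s1.left + (s2.left - one_a), (s1.right - one_a) + s2.right)

def contract_left(s, a):
    """From A, A, Gamma -> Delta derive A, Gamma -> Delta."""
    if s.left[a] < 2:
        raise ValueError("contraction needs two occurrences of the formula")
    return Sequent(s.left - Counter([a]), s.right)

def contract_right(s, a):
    """From Gamma -> Delta, A, A derive Gamma -> Delta, A."""
    if s.right[a] < 2:
        raise ValueError("contraction needs two occurrences of the formula")
    return Sequent(s.left, s.right - Counter([a]))

# Modus Ponens as a cut: from -> A and A -> B we obtain -> B.
print(cut(seq([], ["A"]), seq(["A"], ["B"]), "A"))
```

Counter addition keeps multiplicities, which is exactly the convention for combining $\Gamma_1$ and $\Gamma_2$ into $\Gamma_{1,2}$.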

Although LK does represent classical logic, it does so with some un- 
usual subtleties, some properties of monotonicity and conservation. Formu- 
las never get simpler. The logical rules permit us to make new formulas by 
combining old ones, and to introduce connectives, but they do not permit 
us to simplify formulas. Formulas never disappear, except in cuts. The only 
other simplification allowed is contraction, in which a repetition is reduced. 
Formulas do not appear suddenly. Everything has to be constructed from 
the formulas in the axioms. 






An important consequence of this is that the size of a proof is controlled 
by the axioms, unless one stupidly applies the negation rules over and over 
again for no reason. In a large proof one should typically expect a large 
number of axioms. If the proof is large but the number of axioms is much 
smaller, and one does not repeat the negation rules in a foolish manner, 
then it means that there are a lot of "weak formulas", i.e., the formulas 
coming from the $\Gamma$'s and $\Delta$'s in the axioms $A, \Gamma \to \Delta, A$. If this happens
then either the proof relied on foolish contortions involving weak formulas, 
so that it could be shortened, or the proof has a small number of steps and 
the endsequent is full of formulas coming from weak formulas, which is not 
too interesting. 

Sometimes a crucial ingredient for the size of a proof looks dumb in terms 
of reasoning. The contraction rule, for instance, looks pretty silly, but we 
shall see in the next sections that it matters much for complexity. (In Linear 
logic [24], an extension of classical logic, contractions are controlled in an 
explicit way through exponential operators.) The problem of whether or not 
a sequent has a short proof can be seen as a purely combinatorial issue, apart 
from reasoning. 

3 Cut elimination 

Theorem 1 (Gentzen [22, 25, 57]) Any proof in LK can be effectively transformed
into a proof which never uses the cut rule. This works for both propositional
and predicate logic.

There is a version of this for arithmetic but one has to allow infinite 
proofs, because of induction. One can analyze the level of infiniteness used, 
measured by countable ordinals, as in [51]. 

This is a gorgeous theorem. It says first that we can make proofs with- 
out Modus Ponens, which is a bit striking. It also has nice consequences, 
including the subformula property: given a proof of a sequent which does not 
use the cut rule, then every formula which appears in the proof also appears 
as a subformula of a formula in the final sequent. This follows from the 
monotonicity properties of LK.
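
In the propositional case the subformula property is easy to test mechanically. The sketch below is a hypothetical illustration, not a proof checker: formulas are nested tuples, a "proof" is just the list of sequents occurring in it, and the function names are our own. It only checks that every formula appearing in the listed sequents is a subformula of some formula in the final sequent.

```python
# Hypothetical encoding: ("var", name), ("not", f), ("and", f, g), ("or", f, g).

def subformulas(f):
    """All subformulas of f, including f itself."""
    result = {f}
    for part in f[1:]:
        if isinstance(part, tuple):  # skip the name stored inside ("var", name)
            result |= subformulas(part)
    return result

def has_subformula_property(proof, end_sequent):
    """proof is a list of (left, right) pairs of formula lists."""
    left, right = end_sequent
    allowed = set()
    for f in list(left) + list(right):
        allowed |= subformulas(f)
    return all(f in allowed for l, r in proof for f in list(l) + list(r))

p, q = ("var", "p"), ("var", "q")
lem_p = ("or", p, ("not", p))
lem_q = ("or", q, ("not", q))

# Sequents of a cut-free proof of -> p or not p.
cut_free = [([p], [p]), ([], [p, ("not", p)]), ([], [lem_p])]

# Sequents of a proof of the same end sequent that cuts on q or not q.
with_cut = [([q], [q]), ([], [q, ("not", q)]), ([], [lem_q]),
            ([p, lem_q], [p]), ([lem_q], [p, ("not", p)]), ([lem_q], [lem_p]),
            ([], [lem_p])]

print(has_subformula_property(cut_free, ([], [lem_p])))
print(has_subformula_property(with_cut, ([], [lem_p])))
```

The second list fails the test because the cut formula, built from q, leaves no trace in the final sequent.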

There is a nice image associated to proofs in LK and the way that for- 
mulas are constructed. One can code the combinatorics of the construction 
of a formula with a tree. Combining two formulas A and B into $A \land B$, for
instance, corresponds to combining the trees associated to A and B. As for- 
mulas progress through LK they are constructed in this manner. The trees 
never disappear when there are no cuts.

We shall discuss the proof of the cut-elimination theorem in a moment,
but let us begin with the following question: what is the price for eliminating 
cuts? Just think about the difficulty in ordinary mathematical activity of 
making proofs which satisfy the subformula property. 

The price comes in the expansion. The elimination of cuts leads to a 
simplification in the dynamical structure of the proof at the cost of large 
expansion in the size. There are propositional tautologies for which cut-free 
proofs must be exponentially larger than proofs with cuts, and in predicate 
logic the expansion can be non-elementary. See [44, 45, 53, 54, 55, 58]. 

A nicely concrete example of this expansion is provided by the pigeon-hole 
principle. We saw in the preceding section how the pigeon-hole principle can 
be formalized as a sequence of propositional sequents, one for each positive 
integer n. It turns out that the pigeon-hole principle can be proved in LK 
with a proof of polynomial size in n if one allows cuts [7], while exponential 
size is required for proofs without cuts [31]. See [1, 2, 49, 4, 3] for related 
work. Analogous results hold for the propositional version of the Ramsey 
theorem (see [47]). 

The idea of lengths of proofs is quite amusing from the perspective of 
reasoning. It suggests that there are some statements that are true that we 
cannot understand in practice because it would take too long, and that there 
are statements which we can understand if we permit ourselves to use cuts 
and not otherwise. Induction plays a similar role in arithmetic, and indeed 
the strength of induction required in arithmetic for various purposes has been 
much studied. (See [30].) 

What is it about the cut rule that permits this compression of proofs? One 
can think that the size of the proof is being compressed even if the dynamical 
content remains the same in essence. The use of lemmas permits one to 
make repeated substitutions with the same coding. We shall see this more 
concretely in Sections 8 and 9. There are graphs associated to proofs which 
trace the flow of logical occurrences, and these graphs are approximately trees
for cut-free proofs, but proofs with cuts can have cycles. We shall discuss 
this further in Section 10. 

As in the introduction, the existence of short proofs for all propositional 
tautologies is equivalent to NP = co-NP. It is not clear exactly what
should create an impassable obstruction to compression. People have looked 
at combinatorial principles stated as tautologies in the hope of showing that 
the proofs had to be long even if cuts were allowed, but no one has succeeded 
in doing this so far. We already saw that the pigeon-hole principle does have 
short proofs (polynomial size). Similarly there are finite versions of Ramsey 
theorems which have been coded as statements in propositional logic and for 
which there are short proofs. Unlike the pigeon-hole principle these short
proofs have not been given explicitly though. See [47].

In considering issues of complexity one should not be distracted too much 
by the interpretation in terms of reasoning. It seems more natural to view 
these as combinatorial problems which are merely expressed in the language 
of logic. Indeed they seem to have a natural geometry to them, discussed in 
Section 10. 

How can one try to prove the cut-elimination theorem? The first point 
to understand is that it is not simply a matter of expressing the cut rule 
in terms of the others. When one studies logic as an undergraduate one 
is frequently told that various languages are equivalent, or various proof 
systems are equivalent, by dint of rules that permit translations from one to 
the other. That is not what is happening here. One has to operate on an 
actual proof, and the argument is more global in nature. 

Roughly speaking the argument works by systematically going through 
a proof and simplifying the cuts. One starts at the bottom, near the final 
sequent, and works upwards, trying to push the cuts up until one encounters 
axioms or other simple situations. For instance, if we get to a cut of the form 

\[ \frac{A \to A \qquad A \to B}{A \to B} \]

we can eliminate it stupidly by throwing away the axiom. In general though 
we have to take into account the specific structure of the situation. Consider 
the following situation. 

\[
\frac{\Gamma \to A \qquad \dfrac{\dfrac{A \to B \qquad A \to C}{A, A \to B \land C}}{A \to B \land C}}{\Gamma \to B \land C}
\]

We used here first the rule for introducing a conjunction on the right, then a
contraction, then a cut. Think of this as appearing in the middle of a proof,
so that we have a proof $\Pi_0$ of $\Gamma \to A$, a proof $\Pi_1$ of $A \to B$, and a proof
$\Pi_2$ of $A \to C$, and we are combining these three proofs to get a proof of
$\Gamma \to B \land C$. We have used here a cut and we want to get rid of it. To do
this we work as follows.

\[
\frac{\dfrac{\dfrac{\Gamma \to A \qquad A \to B}{\Gamma \to B} \qquad \dfrac{\Gamma \to A \qquad A \to C}{\Gamma \to C}}{\Gamma, \Gamma \to B \land C}}{\Gamma \to B \land C}
\]

Here we started with a pair of cuts, followed by the conjunction rule and then
contractions. This gives us another way to combine the proofs $\Pi_0$, $\Pi_1$, and
$\Pi_2$ into a proof of $\Gamma \to B \land C$. From the point of view of cut elimination this
arrangement is better than the previous one. That might seem strange, since
we now have two cuts instead of one, but the two new cuts are at simpler
stages in the proof than the original one. We have moved the cut up higher
in the proof, across the contraction (from $A, A$ to $A$). In doing this we have
introduced new contractions (to get from $\Gamma, \Gamma$ to $\Gamma$), but more importantly
we have had to use the proof $\Pi_0$ of $\Gamma \to A$ twice. This is the principal source
of expansion in the process of cut elimination, the duplication of subproofs
that occurs when one pushes the cut up over a contraction.
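
One can put rough numbers on this duplication. The Python sketch below counts inference steps for the two arrangements above, and then iterates the same situation in a toy model. All of it rests on our own simplifying assumptions: a proof is measured only by its number of steps, the function names are hypothetical, and the nested model (in which the proof playing the role of $\Pi_0$ is itself obtained by the same construction at each level) is an illustration of the mechanism, not a bound taken from the literature.

```python
def size_with_cut(p0, p1, p2):
    """Steps in the first arrangement: Pi_0, Pi_1, Pi_2 used once each,
    plus one conjunction rule, one contraction and one cut."""
    return p0 + p1 + p2 + 3

def size_after_pushing_cut(p0, p1, p2, gamma_size):
    """Steps in the second arrangement: Pi_0 is used twice, followed by two
    cuts, one conjunction rule, and one contraction for each formula of
    Gamma (counted with multiplicity) to pass from Gamma, Gamma to Gamma."""
    return 2 * p0 + p1 + p2 + 3 + gamma_size

def nested_sizes(depth, base=1, overhead=3):
    """Toy model: at each level the previous proof is either reused through
    a lemma (shared) or copied twice (duplicated), plus a fixed overhead."""
    shared, duplicated = base, base
    for _ in range(depth):
        shared = shared + overhead
        duplicated = 2 * duplicated + overhead
    return shared, duplicated

print(size_with_cut(10, 1, 1), size_after_pushing_cut(10, 1, 1, gamma_size=2))
print(nested_sizes(10))
```

In this toy model the shared size grows linearly with the depth while the duplicated size roughly doubles at each level.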

This example corresponds to the idea of a lemma in ordinary mathemat- 
ics. In the original piece of proof we knew that A would imply each of B and 
C, we had a derivation of A from $\Gamma$, and we wanted to obtain $B \land C$. We did
not want to have to derive A from $\Gamma$ twice, we wanted one "lemma" to say
that it was true. By using the contraction and cut we were able to do this, 
but by eliminating the cut we had to repeat the proof for each application. 

The use of the cut enabled us to make a shorter proof, but to gain this 
efficiency we had to merge two occurrences of A even though they could be 
used in completely different ways. To understand this point it is helpful to 
think of arithmetic. Think of A as saying that some property of numbers 
is preserved when one multiplies two of them together. This fact might be 
employed several times in the proof, applied to different numbers or terms, 
even though proved only once. It may be that one lemma is applied to terms 
obtained from previous applications of itself. This type of phenomenon occurs 
in the examples discussed in Sections 8 and 9. 

Here is another example, but in predicate logic this time. 

\[
\frac{\dfrac{\dfrac{\dfrac{A \to F(t)}{A \to \exists x F(x)} \qquad \dfrac{B \to F(s)}{B \to \exists x F(x)}}{A \lor B \to \exists x F(x), \exists x F(x)}}{A \lor B \to \exists x F(x)} \qquad \dfrac{F(a) \to \Delta}{\exists x F(x) \to \Delta}}{A \lor B \to \Delta}
\]

Again think of this as being part of a larger proof, in which we have already
proofs of $A \to F(t)$, $B \to F(s)$, and $F(a) \to \Delta$, where the eigenvariable a
does not occur free in $\Delta$. Here s and t are terms, which means that they can
be constructed from constants and variables using function symbols. This
provides a nice example of the way that the cut rule can be used; it permits
us to make the lemma that $\exists x F(x) \to \Delta$ rather than deriving $\Delta$ from $F(x)$
for each possible choice of x. To push the cut upward we need to make
separate proofs for s and t. We do this as follows.



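\[
\frac{\dfrac{\dfrac{A \to F(t) \qquad F(t) \to \Delta}{A \to \Delta} \qquad \dfrac{B \to F(s) \qquad F(s) \to \Delta}{B \to \Delta}}{A \lor B \to \Delta, \Delta}}{A \lor B \to \Delta}
\]

We get the proofs of $F(s) \to \Delta$ and $F(t) \to \Delta$ by taking the proof of $F(a) \to \Delta$
in the original and substituting the terms s and t for the eigenvariable a.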

This ability to make substitutions is a fundamental difference between 
predicate and propositional logic. Substitutions interact with the cut rule 
in a very substantial way. In passing from $F(s)$ and $F(t)$ to $\exists x F(x)$ we are
doing something very strong, since the terms s and t need not have anything 
to do with each other. 

These two examples illustrate the ideas and effects of pushing a cut up 
across a contraction. In fact there is a general procedure which works in all 
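cases. If we start with

\[
\frac{\dfrac{\overset{\Pi_1}{\Gamma_1 \to \Delta_1, A, A}}{\Gamma_1 \to \Delta_1, A} \qquad \overset{\Pi_2}{A, \Gamma_2 \to \Delta_2}}{\Gamma_1, \Gamma_2 \to \Delta_1, \Delta_2}
\]

where A is an arbitrary formula, then we can replace it with

\[
\frac{\dfrac{\dfrac{\overset{\Pi_1}{\Gamma_1 \to \Delta_1, A, A} \qquad \overset{\Pi_2}{A, \Gamma_2 \to \Delta_2}}{\Gamma_1, \Gamma_2 \to \Delta_1, \Delta_2, A} \qquad \overset{\Pi_2}{A, \Gamma_2 \to \Delta_2}}{\Gamma_1, \Gamma_2, \Gamma_2 \to \Delta_1, \Delta_2, \Delta_2}}{\Gamma_1, \Gamma_2 \to \Delta_1, \Delta_2} \ \text{(contractions)}
\]

In doing this we have to duplicate the proof $\Pi_2$. This can lead to large
expansion in the size of the proof if we have to do it many times.

This explains how one can push a cut up across any contraction. Consider
the problem of pushing a cut up over a conjunction, as in

\[
\frac{\dfrac{\Gamma_1 \to \Delta_1, A \qquad \Gamma_2 \to \Delta_2, B}{\Gamma_1, \Gamma_2 \to \Delta_1, \Delta_2, A \land B} \qquad \dfrac{A, B, \Gamma \to \Delta}{A \land B, \Gamma \to \Delta}}{\Gamma, \Gamma_1, \Gamma_2 \to \Delta, \Delta_1, \Delta_2}
\]

We can replace this with

\[
\frac{\Gamma_2 \to \Delta_2, B \qquad \dfrac{\Gamma_1 \to \Delta_1, A \qquad A, B, \Gamma \to \Delta}{B, \Gamma, \Gamma_1 \to \Delta, \Delta_1}}{\Gamma, \Gamma_1, \Gamma_2 \to \Delta, \Delta_1, \Delta_2}
\]

Now we are using two cuts, but they have lower complexity.
For this replacement we could just as well have used

\[
\frac{\Gamma_1 \to \Delta_1, A \qquad \dfrac{\Gamma_2 \to \Delta_2, B \qquad A, B, \Gamma \to \Delta}{A, \Gamma, \Gamma_2 \to \Delta, \Delta_2}}{\Gamma, \Gamma_1, \Gamma_2 \to \Delta, \Delta_1, \Delta_2}
\]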

This reflects an important feature of cut-elimination: there is no canonical 
way to do it. We had a choice here about how to arrange the cuts. Similarly 
for contractions, if both appearances of the cut formula A were obtained 
from contractions, then we would have a choice as to which subproof to 
duplicate first. In principle we can have procedures of cut-elimination which
go on forever. Of course the point of the theorem is that one can always
find a way to eliminate cuts in a finite number of steps. One can even make
deterministic procedures by imposing conditions on the manner in which the 
transformations are carried out. See [24]. 

The cases that we have considered illustrate well the general scheme of 
the proof of the cut-elimination theorem. The principle is to push the cuts up 
higher in the proof, but we have to be careful about the notion of "progress",
because we typically increase the number of cuts at each stage of the process. 
In the examples about contractions we made progress in the sense that we 
reduced the number of contractions above the cut formula, even though we 
may increase the total number of contractions by adding them below the 
cut. In the example with conjunctions we reduced the complexity of the cut 
formula. It is not hard to make examples to exhaust the possibilities, but 
a complete proof requires a tedious verification of cases that we shall not
provide. (See [25, 57].) 

4 Mathematics and formal proofs 

Mathematical logic provides a way to formalize ordinary mathematical ac- 
tivity, and cut-elimination has a very interesting role in this. 

For this story we should back up to the Hilbert program, which sought to
show that mathematics could be formalized in a pure way and that in princi- 
ple abstraction could be avoided. (See [6].) One can try to treat mathematics 
in a completely symbolic manner, where mathematical formulas are strings 
of symbols constructed through fixed formal rules. The rules of logical inference
are formal as well and permit the passage from one string of symbols to
another. In this way proofs can be treated as mathematical objects in their 
own right and studied mathematically.

In mathematics we have a natural informal hierarchy of abstraction. We
can start with the very concrete, like natural numbers and finite graphs, and 
proceed to the more transcendental, real and complex numbers, real and 
complex analysis. We can proceed further from "simple" infinite processes 
to infinite-dimensional ones, like Hilbert spaces and operators on them, then 
further still to highly abstract and nonconstructive concepts, often associ- 
ated to the axiom of choice, such as ultrafilters. Many aspects of analysis, 
for instance, are touched by the phenomenon of nonconstructive existence 
through compactness or the Hahn-Banach theorem. 

In some cases these abstractions are "approximately concrete", as with 
infinite processes which admit well-behaved finite approximations. Some- 
times remote abstractions wander into more concrete worlds. Compactness 
or the Hahn-Banach theorem may be used to "provide" theoretical solutions 
to explicit differential equations, or compactness can lead to uniform bounds 
without providing a clue as to how to generate an actual number. Even more 
elementary questions about integers are sometimes treated through transcen- 
dental methods or nonconstructive abstractions of compactness. 

These phenomena are naturally troubling. The literature is full of debates 
and attempts to address the problem. Hilbert's program, in its strongest 
form, would have provided a very attractive resolution of these difficulties. 
It sought to show, roughly speaking, that concrete statements that are true 
should always have finite proofs. It sought to show that the infinite methods 
of general mathematical activity would not lead us to trouble, and that the 
infinite abstractions were a convenience that could, in principle, be replaced 
with more direct elementary arguments for elementary assertions. 

In its strongest form Hilbert's program failed and failed utterly, because 
of the celebrated work of Gödel. 

In some ways Gödel's work has had unfortunate negative side effects. 
Gödel's results have enormous conceptual consequences, and they force one 
to confront and accept some troubling phenomena, but some of the ideas 
behind Hilbert's program retain their strength despite being overshadowed 
by the failure of other aspects. 

Gentzen's theorem arose in this context. One can view it as providing 
a positive result in contrast with Gödel's work. Cut elimination gives an 
approach to converting indirect proofs into direct ones. Gentzen's work also 
helps to clarify the precise meaning of an "elementary" proof, which Hilbert 
had left vague and intuitive. 

Cut elimination need not work in an arbitrary logical system, or it may 
work with qualifications. In arithmetic it may convert a finite proof into an 
infinite one. Still, one can often control the level of infinite processes used 
(transfinite induction up to a certain ordinal) and thereby obtain 
"quasi-elementary" proofs of elementary statements. One of Gentzen's purposes 
was to bring the consistency of arithmetic closer to elementary arguments. 

Keep in mind that cut elimination does transform finite proofs into finite 
ones in ordinary predicate logic. Arithmetic is simply much more 
complicated, as we see in Gödel's results and the necessity of passing to infinite 
proofs when eliminating cuts. Remember though that the set of tautologies 
in ordinary predicate logic is algorithmically undecidable, and that cut elim- 
ination leads to large expansion of proofs. Thus ordinary predicate logic is 
finite in ways that arithmetic is not, but the expansion is still there, in a 
weaker form. 

This view of cut elimination and Hilbert's program is illustrated well by 
some work of Girard [25]. The story begins with Hilbert's program in reverse: 
Furstenberg and Weiss [20] found a very elegant proof through transcenden- 
tal methods of dynamical systems of the van der Waerden theorem [59] on 
arithmetic progressions. In this case the elementary proof came first, and the 
short nonelementary proof arrived much later. Girard showed however that 
one could recover the elementary combinatorial arguments by applying the 
procedure of cut elimination to the methods of Furstenberg and Weiss. 

Other examples of analysis of mathematical proofs through cut elimi- 
nation can be found in [5, 32, 33, 34] and in unpublished work of Kreisel 
concerning a theorem of Littlewood in number theory. For this type of anal- 
ysis a basic tool is the no counterexample interpretation of Kreisel for turning 
infinite arguments into finite ones through the introduction of functionals. 

This basic idea, of looking for consequences of Gentzen's work for ordinary 
mathematics, occurs repeatedly in the writings of Kreisel [37, 39, 38], and 
he raises many concrete questions. Progress has been made, but the issue 
remains to be addressed in a strong way. 

5 Remarks about the subformula property 

A proof satisfies the subformula property if every formula that appears in the 
proof also arises as a subformula of a formula in the endsequent. We saw in 
Section 3 that any provable sequent has a proof which enjoys the subformula 
property, because any cut-free proof has this property. 

We should be a little bit careful here. What exactly is a subformula? 
Basically A is a subformula of B if you can get B from A by adding things to 
it. For propositional logic this is straightforward, but quantifiers in predicate 
logic bring a subtlety with them. We can start with a formula $R(t)$, where 
$R$ is a unary relation and $t$ is a term, and we can build from it the formula 
$\exists x\, R(x)$. We consider $R(t)$ to be a subformula of $\exists x\, R(x)$. 
If $s$ is another term then $R(s)$ is also considered to be a subformula of 
$\exists x\, R(x)$. 

This is an enormous difference between propositional and predicate logic. 
In propositional logic if we know a formula then we know its ancestors ex- 
actly. If it were true in predicate logic that the ancestors of a formula are 
determined by the formula itself, then there would be an algorithm for decid- 
ing which sentences are tautologies, which is not true. This would amount 
to saying that any tautology would have a cut-free proof which could then 
be controlled. 
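
To make the contrast concrete, here is a minimal Python sketch. It is not part of the original exposition: formulas are encoded as nested tuples, an encoding chosen only for illustration, and the helper names `subformulas`, `numerals`, and `instances_of_exists` are hypothetical. For a propositional formula the set of subformulas is finite and can be computed outright; for $\exists x\, R(x)$ the candidates $R(s)$ range over all terms $s$, so the best one can do is enumerate them lazily.

```python
from itertools import islice


def subformulas(formula):
    """Return the set of subformulas of a propositional formula.

    Formulas are nested tuples such as ("var", "p"), ("not", f),
    ("and", f, g), ("or", f, g), ("imp", f, g).
    """
    result = {formula}
    for child in formula[1:]:
        # Variable names are plain strings; only tuples are formulas.
        if isinstance(child, tuple):
            result |= subformulas(child)
    return result


def numerals():
    """Enumerate the closed terms 0, s(0), s(s(0)), ... without end."""
    term = "0"
    while True:
        yield term
        term = f"s({term})"


def instances_of_exists(relation, terms):
    """Lazily yield R(s), one candidate subformula of (exists x) R(x) per term s."""
    for s in terms:
        yield (relation, s)


p, q = ("var", "p"), ("var", "q")
finite_set = subformulas(("imp", ("and", p, q), p))
first_three = list(islice(instances_of_exists("R", numerals()), 3))
```

The first call terminates with a finite set; the second stream never ends, which is one way to see why the ancestors of a formula are not determined by the formula itself in predicate logic.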

The size of the formulas which appear in a cut-free proof in propositional 
logic is bounded by the size of formulas in the endsequent. In predicate logic 
this does not work because the terms that appear inside the proof can be 
larger than the ones in the endsequent. In both propositional and predicate 
logic the size of a cut-free proof - the number of symbols in the proof - can be 
controlled in terms of the size of the endsequent together with the number of 
lines in the proof. (See [35].) In both cases the number of lines in a cut-free 
proof can be very large compared to the size of the endsequent because of 
the presence of contractions. 

We should emphasize the relation between the subformula property and 
the idea of "lemmas" in a proof. Gentzen's cut-elimination theorem is often 
described as a procedure for making direct proofs, for avoiding intermediate 
results which are more general than the final theorem. The subformula prop- 
erty is a manifestation of this idea, that nothing occurs in the proof which 
is more general than the final result. To understand this in a more concrete 
way think about proving a formula $A(t)$ that has no quantifiers but does 
have a term $t$ in it. In establishing this tautology we might use formulas of 
the form $R(s_i)$ many times, for various terms $s_i$. One can imagine that a 
shorter proof might be possible using $\forall x\, R(x)$. This would not be 
allowed in a cut-free proof, because of the subformula property, and because 
$A(t)$ has no quantifiers. In a cut-free proof all occurrences of $R(s_i)$ for 
the relevant terms $s_i$ would have to be listed separately. 

Some of these phenomena occur in the examples discussed in Sections 8 
and 9 below. 

6 The Craig interpolation theorem 

Roughly speaking the Craig interpolation theorem [17] states that if one can 
prove the sequent 

$$A \to B$$

in LK then one can also prove the sequents 

$$A \to C \quad\text{and}\quad C \to B$$

where the formula C - called the interpolant of A and B - involves only 
the language common to A and B. Common language means the common 
propositional variables in the case of propositional logic, and the relations, 
functions, and constants for the predicate case. 

In terms of reasoning this is not at all surprising. If A involves apples 
and oranges, B involves apples and bananas, and A implies B, then A ought 
to imply a statement that involves only apples, and B ought to follow from 
a statement that involves only apples. The oranges should not help, and the 
bananas should not hurt. 

So what is the mystery then? The Craig theorem is trickier to prove 
than one might think. One has to have the same statement about apples 
for both A and B! To construct C and proofs of $A \to C$ and $C \to B$ the 
cut-elimination theorem is extremely useful. Once one has a cut-free proof, 
the construction of C is a fairly simple combinatorial problem (see [40] and 
also [25, 57]). 
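
For propositional logic the existence of an interpolant can also be seen by brute force, and a small Python sketch may help fix ideas. This is not the construction from a cut-free proof described above: it is the semantic shortcut of taking C to be "A holds for some choice of the variables that occur only in A", and it runs in exponential time. The tuple encoding of formulas and the names `interpolant_table` and `is_interpolant` are assumptions made for this illustration; C is returned as a truth table over the common variables rather than as a formula.

```python
from itertools import product


def variables(formula):
    """Set of propositional variable names occurring in a formula."""
    if formula[0] == "var":
        return {formula[1]}
    return set().union(*(variables(sub) for sub in formula[1:]))


def evaluate(formula, env):
    """Truth value of a formula under env (dict: variable name -> bool)."""
    op = formula[0]
    if op == "var":
        return env[formula[1]]
    if op == "not":
        return not evaluate(formula[1], env)
    if op == "and":
        return evaluate(formula[1], env) and evaluate(formula[2], env)
    if op == "or":
        return evaluate(formula[1], env) or evaluate(formula[2], env)
    if op == "imp":
        return (not evaluate(formula[1], env)) or evaluate(formula[2], env)
    raise ValueError(f"unknown connective: {op}")


def interpolant_table(a, b):
    """Truth table of C over the common variables of a and b.

    C(sigma) is true exactly when sigma extends to an assignment satisfying a.
    """
    common = sorted(variables(a) & variables(b))
    private = sorted(variables(a) - variables(b))
    table = {}
    for values in product([False, True], repeat=len(common)):
        sigma = dict(zip(common, values))
        table[values] = any(
            evaluate(a, {**sigma, **dict(zip(private, extra))})
            for extra in product([False, True], repeat=len(private))
        )
    return common, table


def is_interpolant(a, b, common, table):
    """Check by exhaustion that a implies C and C implies b."""
    names = sorted(variables(a) | variables(b))
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        c = table[tuple(env[v] for v in common)]
        if evaluate(a, env) and not c:
            return False
        if c and not evaluate(b, env):
            return False
    return True


apples, oranges, bananas = ("var", "apples"), ("var", "oranges"), ("var", "bananas")
A = ("and", apples, oranges)
B = ("or", apples, bananas)
common, table = interpolant_table(A, B)
```

In the apples-and-oranges example the table describes the formula "apples" itself. The check `is_interpolant` fails when A does not imply B, as it should.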

In fact one can formulate Craig interpolation in purely combinatorial 
terms [10], in such a way that the special nature of formulas does not really 
matter. The interpolation theorem can be seen as a general statement with 
few structural requirements that would apply as well to graphs or polygons 
or tessellated surfaces as to formulas in logic. 

It is important here that we have cut elimination to start with. It would 
be much more difficult to give a general combinatorial formulation of Craig 
interpolation for proofs with cuts. With cut-elimination as a starting point 
one need only understand how to combine basic objects in a natural way. 
Formulas are the basic objects in logic, but to have a more combinatorial 
image it is nicer to think of them simply as trees. Think of coding the 
construction of a formula by a tree, and then forgetting the logic and just 
keeping the tree. It is easy to combine two trees, by connecting them at the 
root, and after cut elimination this is essentially the kind of operation that 
Craig interpolation requires. If there are cuts in the proof then the matter 
is entirely different. One would then have to go inside the structure of the 
objects, rather than simply combining them. 

This brings us to an interesting point about cut-elimination, as an indi- 
cation of simplicity or "finiteness" of the combinatorics of proofs. It is better 
to have cut elimination with large expansion than to not have it at all. In 
what other contexts in mathematics is there something analogous? Some 
"normalization" which always exists (in a nontrivial way)? One is tempted 
to draw an analogy with singularities of algebraic varieties. Resolution of 
singularities is a difficult matter, but it reflects an underlying finiteness that 
is not typically present in analysis or topology, for instance. 

Cut elimination permits us to derive some general results through effective 
constructions, like Craig Interpolation and Herbrand's theorem (discussed 
below). However, the use of cut elimination does not provide interesting 
bounds. One constructs the interpolant by induction, at each step combining 
old interpolating formulas to get a new one. Contractions in the proof do 
not lead to contractions of interpolating formulas. In fact the interpolant 
reflects the entire construction in the proof. For this reason its size may be 
very large compared to the size of the endsequent, but linear in the size of 
the cut-free proof. 

It is not apparent that interpolation should necessarily be as complicated 
as cut elimination. 

What kind of bounds on the size of the interpolant C can one obtain in 
terms of the sizes of A and B? If the interpolant C must be large compared 
to A and B, what would that imply about the structure of $A \to B$? 

In first-order predicate logic (with equality) one can have nonrecursive 
expansion for the size of the interpolant over the size of the endsequent, as 
in [19, 41]. 

What about propositional logic? Is it true that the size of C is always 
bounded by a polynomial in the sizes of A and B? If so, then one would have 
a general result in complexity theory, namely 
$NP \cap co\text{-}NP \subseteq P/poly$. This problem is a weaker cousin of 
$P = NP$, but it is equally unknown. The main point is that a language which 
is in $NP \cap co\text{-}NP$ can be coded in terms of a family of sequents 
$A_n \to B_n$ of controlled size, $n = 1, 2, 3, \ldots$, and a family of 
interpolants $C_n$ involving only the common variables of $A_n$ and $B_n$ 
would lead to a family of circuits which characterize the original language. 
See [42, 43, 11] for more details. 

If one does not believe in polynomial bounds for general classes of prob- 
lems which appear to require exponential time, then one would expect the 
failure of polynomial bounds for Craig interpolation. 

This brings us back to our earlier questions about what is a difficult proof, 
what are the mechanisms by which proofs with cuts can be much shorter than 
cut-free proofs, etc. These questions fit well with the problem of knowing 
what structure in a sequent $A \to B$ is needed if the interpolant C is always 
large. See [11] for some results in this direction. 

What kinds of bounds can one get for the sizes of the proofs of $A \to C$ and 
$C \to B$ in terms of the proof of $\Pi : A \to B$? In the propositional case, 
for instance, can one construct C and proofs of $A \to C$ and $C \to B$ whose 
sizes are bounded by a polynomial in the size of $\Pi$? Is there a 
polynomial-time algorithm for finding these proofs? These are all questions 
that are open. See [11] for further discussion of these matters. 

There is an interesting variant of these questions, in which one starts 
with a truth assignment $\sigma$ for the variables common to A and B and looks 
for either a proof of $A^\sigma \to$ or of $\to B^\sigma$ of controlled size. 
Roughly speaking $A^\sigma$ and $B^\sigma$ are obtained from A and B using the 
truth assignment $\sigma$. See [11] for details. The idea is that if we have an 
interpolating formula C, then $\sigma$ converts C simply into "true" or 
"false", and proofs of $A \to C$ and $C \to B$ would then give rise to either a 
proof of $A^\sigma \to$ or of $\to B^\sigma$ according to whether we obtained 
"false" or "true" from C, respectively. If for each truth assignment $\sigma$ 
we can decide which of $A^\sigma \to$ or $\to B^\sigma$ is provable in 
polynomial time, then that is essentially the same as finding an interpolant 
C of polynomial size. In both cases we get a function which can be computed 
in polynomial time even if the descriptions are not the same. 

7 Herbrand's theorem 

Let us begin with a simple version of Herbrand's theorem. 

Theorem 2 If $A(x)$ is a formula without quantifiers in which the variable $x$ 
appears free, then $\to \exists x\, A(x)$ is provable in LK if and only if there 
is a finite collection of terms $t_1, \ldots, t_n$ such that 
$\to A(t_1), \ldots, A(t_n)$ is provable using only propositional rules (i.e., 
without quantifier rules). 

We should emphasize that $A(x)$ is allowed to be a formula, and not simply 
a relation, so that $A(x)$ might have the form 
$F(x) \wedge \neg G(x, \psi(x))$, for instance. 

Roughly speaking, the theorem says that if you can prove $\exists x\, A(x)$, then 
you can make the existence explicit by producing a finite collection of terms, 
at least one of which satisfies the property $A(\cdot)$. Of course the converse 
is true because of the $\exists$ : right and contraction rules. 
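
A small example may make the statement more tangible. The following Python sketch uses a sample matrix that is not taken from the text, namely $A(x) = P(x) \supset P(f(x))$ with a constant $c$ and a function symbol $f$; the names `term`, `instance`, and `disjunction_is_tautology` are hypothetical. Each atomic formula $P(t)$ is treated as a propositional variable, and the sequent $\to A(t_1), \ldots, A(t_n)$ is tested by checking all truth assignments.

```python
from itertools import product


def term(k):
    """The closed term f(f(...f(c)...)) with k applications of f."""
    t = "c"
    for _ in range(k):
        t = f"f({t})"
    return t


def instance(t):
    """A(t) for the sample matrix A(x) = P(x) > P(f(x)).

    The instance is returned as the pair (hypothesis atom, conclusion atom).
    """
    return (f"P({t})", f"P(f({t}))")


def disjunction_is_tautology(terms):
    """Is A(t_1) or ... or A(t_n) true under every assignment to the atoms?"""
    pairs = [instance(t) for t in terms]
    atoms = sorted({atom for pair in pairs for atom in pair})
    for values in product([False, True], repeat=len(atoms)):
        env = dict(zip(atoms, values))
        if not any((not env[hyp]) or env[concl] for hyp, concl in pairs):
            return False
    return True


one_term = disjunction_is_tautology([term(0)])
two_terms = disjunction_is_tautology([term(0), term(1)])
```

With the single term $c$ the disjunction can be falsified, but the two terms $c$ and $f(c)$ suffice: if $A(c)$ fails then $P(f(c))$ is false, which makes $A(f(c))$ true. So no single witness works, yet a finite list does.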

The theorem is very easy to obtain once we have cut elimination. If 
$\to \exists x\, A(x)$ is provable, then it is provable without cuts. Thus we may 
assume that we have a cut-free proof $\Pi$ of $\to \exists x\, A(x)$ to begin 
with. In particular $\Pi$ enjoys the subformula property. This means that any 
formula that occurs in $\Pi$ and which has a quantifier in it must be an 
occurrence of $\exists x\, A(x)$. 

Let us start from the bottom of $\Pi$, at the endsequent $\to \exists x\, A(x)$, 
and think about what happens as we go up in the proof. We may pass through 
contraction rules, which would cause $\exists x\, A(x)$ to be duplicated as we 
go up. The first moment at which $\exists x\, A(x)$ is derived from something 
other than a contraction rule it must be obtained using $\exists$ : right, 
starting from a formula of the form $A(t)$ for some term $t$. 

It is easy to use these observations to reorganize $\Pi$ to get a proof of 
$\to A(t_1), \ldots, A(t_n)$ as desired, where $t_1, \ldots, t_n$ are the terms 
that arise when we peel off the existential quantifiers. More formally, one 
can first remove the contractions "at the bottom" and make some minor 
reorganizations as needed to get a proof of 
$\to \exists x\, A(x), \ldots, \exists x\, A(x)$, where none of the occurrences 
of $\exists x\, A(x)$ was obtained using a contraction, and then one peels off 
the quantifiers to get $\to A(t_1), \ldots, A(t_n)$. Notice that only 
propositional rules could have been used to obtain the $A(t_i)$'s, because of 
the subformula property (since $A(x)$ is quantifier-free). 

The basic principles of this argument are very general. Let us work with 
general formulas in "prenex" form, which means that all the quantifiers are 
on the outside. (Thus $\exists x \forall y (F(x) \wedge G(y))$ is prenex, 
$(\exists x\, F(x)) \wedge (\forall y\, G(y))$ is not. Formulas can always be 
put into prenex form.) Suppose that a sequent $\Gamma \to \Delta$ consists only 
of prenex formulas and has a cut-free proof $\Pi$ in LK. The midsequent 
theorem [22, 23, 25] states that we can modify $\Pi$ to get a new proof $\Pi'$ 
which has the property that as soon as one of the 
quantifier rules is used in the proof, only quantifier rules and contraction 
rules can be used afterwards. One should think of a proof as being coded 
by a tree here, with different branches being independent of each other until 
the moment that they meet on the way to the endsequent. It is easy to see 
how the midsequent theorem is established. Because a cut-free proof enjoys 
the subformula property, all formulas that appear in it are in prenex form. 
Once a formula has a quantifier attached to it, one cannot use it actively in 
a logical rule that is not a quantifier rule. This makes it easy to rearrange 
the proof in order to use all the quantifier rules after the nonquantifier rules. 

The midsequent theorem shows that in principle the quantifier rules are 
not essential for predicate logic. In fact there is a general statement to the 
effect that tautologies in predicate logic can be converted into propositional 
tautologies without losing information. We gave a version of this in Theo- 
rem 2, but it is not as simple to accommodate alternation of universal and 
existential quantifiers. To handle the general case one uses function symbols 
which were not in the original language, called Skolem functions. These spe- 
cial functions code relationships between terms in the Herbrand disjunction 
which allow one to recover the statement in predicate language. See [25] for 
more information. 

Let us illustrate the meaning of Herbrand's theorem in ordinary mathe- 
matics with the following scenario. Suppose that we are interested in some- 
thing like the word problem in finitely presented groups (or semigroups, which 
are technically simpler for this discussion). We might then be interested in 
a sequent of the form 

$$\forall x_1 R_1(x_1), \ldots, \forall x_k R_k(x_k) \to B(t)$$

where the formulas $\forall x_i R_i(x_i)$ contain information from the 
relations of the group (in the form that an element of the group multiplied 
by a relation gives back the same element of the group), and where $B(t)$ is a 
formula without quantifiers which contains the information that a given word, 
represented here by the term $t$, is trivial in the group. 

This sequent is in almost the same form as for the above formulation of 
Herbrand's theorem. That is, we can use the negation rules to convert the 
universal quantifiers on the left side into existential quantifiers on the right 
side. We have several formulas instead of just one, but they can be combined 
on the right with disjunctions. We have several quantifiers, but that does 
not really matter. 

If one can prove such a sequent, then it means that one can show that 
the word is trivial in the group using the given relations. Of course if one 
can prove this one should not use the relations more than a finite number of 
times, even though formulas of the form $\forall x_i R_i(x_i)$ allow for 
infinitely many possible choices of $x_i$ a priori. Herbrand's theorem is a 
general statement to this effect. Cut-elimination provides a procedure to make 
explicit the relations that are needed. Proofs with cuts can be shorter by not 
making explicit the way that the relations are used, or how often. This is the 
power of quantifiers; they are "infinite" objects which can enable one to 
obtain "finite" conclusions much more quickly. 
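
The explicit, finite use of relations can be illustrated with a toy search in Python. The semigroup presentation below (generators a, b with relations ab = ba, aa = a, bb = b) is a made-up example, and the function name `derive` and the cutoff `max_len` are assumptions of this sketch. The search returns a concrete list of relation instances, which plays the role of the finitely many terms in the Herbrand disjunction.

```python
from collections import deque


def derive(word, target, relations, max_len=8):
    """Breadth-first search for a shortest chain of relation applications.

    Each relation (lhs, rhs) may be applied in either direction to any
    occurrence inside the current word. Words longer than max_len are
    discarded to keep the search finite. Returns a list of steps
    (before, relation, position, after), or None if nothing is found.
    """
    parent = {word: None}
    queue = deque([word])
    while queue:
        current = queue.popleft()
        if current == target:
            steps = []
            while parent[current] is not None:
                before, relation, position = parent[current]
                steps.append((before, relation, position, current))
                current = before
            return steps[::-1]
        for lhs, rhs in relations:
            for old, new in ((lhs, rhs), (rhs, lhs)):
                for i in range(len(current) - len(old) + 1):
                    if current[i:i + len(old)] != old:
                        continue
                    nxt = current[:i] + new + current[i + len(old):]
                    if len(nxt) <= max_len and nxt not in parent:
                        parent[nxt] = (current, (lhs, rhs), i)
                        queue.append(nxt)
    return None


RELATIONS = [("ab", "ba"), ("aa", "a"), ("bb", "b")]
steps = derive("abab", "ab", RELATIONS)
```

Here three relation instances are enough to show that abab and ab represent the same element. A proof with cuts and quantifiers need not display such a list, which is where the savings in length come from.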

We shall look more closely at an example of this type in Section 9, in 
connection with the problem of defining large numbers in arithmetic. Before 
we do that we discuss another example from analysis. 

8 The John-Nirenberg theorem 

The John-Nirenberg theorem is a result in real analysis whose proof provides 
an interesting example for cut elimination and the combinatorics of proofs. 

Let $f(x)$ be a locally integrable function on the real line. Given an interval 
$I$ in $\mathbb{R}$, write $A(f, I)$ for the average of $f$ over $I$, 

$$A(f, I) = \frac{1}{|I|} \int_I f(y)\, dy,$$

where $|I|$ denotes the length of $I$. Define the "mean oscillation of $f$ over 
$I$" by 

$$\mathrm{mo}(f, I) = \frac{1}{|I|} \int_I |f(x) - A(f, I)|\, dx.$$

We say that $f$ has bounded mean oscillation (BMO) if 

$$\|f\|_* = \sup_I \mathrm{mo}(f, I) < \infty.$$

Bounded functions satisfy this property, but some unbounded functions do 
too, such as $f(x) = \log |x|$. Note however that $|x|^\alpha$ does not lie in 
BMO for any $\alpha \ne 0$. 

Think of the restriction of $f$ to an interval $I$ as being a localized snapshot 
of $f$. The BMO condition provides a way to say that these snapshots all 
have bounded averages, except that we permit ourselves to subtract off the 
mean value as a kind of normalization. 

How big can a BMO function really be? A simple fact is that 

$$|\{x \in I : |f(x) - A(f, I)| > \lambda\}| \le \lambda^{-1}\, \|f\|_*\, |I|$$

for all $\lambda > 0$, where we write $|E|$ for the Lebesgue measure of $E$. 
This is just Tchebychev's inequality, since 

$$\frac{1}{|I|} \int_I |f(x) - A(f, I)|\, dx \le \|f\|_*$$

simply by definition. However, it turns out that for a BMO function we 
actually have exponential decay of the measure as a function of $\lambda$, 
that is 

$$|\{x \in I : |f(x) - A(f, I)| > \lambda\}| \le C\, 2^{-c \lambda / \|f\|_*}\, |I|$$

for all $\lambda > 0$, with positive constants $C$ and $c$ that do not depend 
on $f$ or $I$. This means that BMO functions are closer to being bounded 
than it might appear at first. This estimate is roughly the best possible, 
because the logarithm has exactly exponential decay on the interval $[0, 1]$, 
for instance. 

This exponential decay is a famous result of John and Nirenberg. See 
[21, 56, 52]. It is very important that in the definition of BMO we take 
the supremum over all intervals. We certainly do not get exponential decay 
on an interval $I$ simply from the knowledge that $\mathrm{mo}(f, I)$ is finite 
for that particular interval. 

This is the general setting of the John-Nirenberg theorem. Let us now 
sketch the proof. The main lemma comes from a construction of Calderon 
and Zygmund, and it says the following. Suppose that we have an interval $J$ 
and a function $h$ on $J$ such that 

$$\frac{1}{|J|} \int_J |h(y)|\, dy \le 1.$$

Then we can find a subset $E$ of $J$ with the following three properties. First, 
$E$ has at least half the points of $J$, 

$$|E| \ge \tfrac{1}{2} |J|.$$

Second, $h$ is not too big on $E$, 

$$|h(x)| \le 2 \quad\text{for almost all } x \in E.$$

The third property is more geometric. It says that $J \setminus E$ can be 
realized as the disjoint union of a collection of subintervals $\{J_i\}$ of $J$, 
in such a way that 

$$\frac{1}{|J_i|} \int_{J_i} |h(y)|\, dy \le 4$$

for each $i$. 

To understand what all of this means think about the first two properties 
and forget about the third for a moment. To get these properties we do not 
need any kind of special geometry, it is simply a question of measure theory. 
If $|h| \le 1$ on average, then we cannot have $|h| > 2$ on more than half the 
points. 
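
The counting behind this remark can be written out in one line. The following is a sketch of the standard Tchebychev argument, stated for the set where $|h| > 2$ and under the hypothesis that the average of $|h|$ over $J$ is at most 1:

```latex
2\,\bigl|\{x \in J : |h(x)| > 2\}\bigr|
  \;\le\; \int_{\{x \in J \,:\, |h(x)| > 2\}} |h(y)|\,dy
  \;\le\; \int_J |h(y)|\,dy
  \;\le\; |J|,
\qquad\text{so}\qquad
\bigl|\{x \in J : |h(x)| > 2\}\bigr| \;\le\; \tfrac{1}{2}\,|J| .
```

Taking $E$ to be the complement of this set in $J$ gives the first two properties, with no information about how the exceptional set is arranged.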

This measure-theoretic argument says nothing about what happens on 
the set where $|h| > 2$. The point of the Calderon-Zygmund construction is 
to break up this "bad" set into pieces where the average is bounded again, 
as stated in the third property. We cannot say exactly what happens on the 
bad set, but we can organize it in a good way. 

Let us describe the main point of the Calderon-Zygmund construction. 
Start with J and break it into its two halves. On each of these ask the 
question "Is the average of $|h|$ larger than 2?" When the answer is yes put 
that interval aside. It will become one of the $J_i$'s. When the answer is no 
take the given interval, cut it into halves, and ask the same question about 
the two new intervals. 

Repeating this process indefinitely we get a bunch of intervals on which 
the average was larger than 2. Outside these intervals $|h| \le 2$ a.e. The 
intervals are disjoint, and one can show that the sum of their measures is 
$\le \frac{1}{2} |J|$ using the assumption that the average of $|h|$ on $J$ is 
$\le 1$. The average of $|h|$ on each of these "bad" intervals is $\le 4$. This 
is because we called an interval bad at the first moment that it was bad: it 
came from splitting an interval on which the average of $|h|$ was $\le 2$. 
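
The stopping-time procedure is easy to run on a discrete model, where the function is a list of numbers whose length is a power of 2 and intervals are index ranges. The following Python sketch makes those assumptions; the name `calderon_zygmund` and the half-open `(start, stop)` convention are choices made here for illustration, and the input is assumed to have average absolute value at most 1 over the whole range.

```python
def calderon_zygmund(h, threshold=2.0):
    """Dyadic stopping-time decomposition of the index range of h.

    len(h) must be a power of 2, and the average of |h| over the whole
    range is assumed to be at most 1. Returns (bad, good), where bad is
    a list of half-open index intervals (start, stop) on which the
    average of |h| exceeds the threshold, and good is the list of
    indices that never fall into a bad interval.
    """
    n = len(h)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of 2"
    bad, good = [], []
    stack = [(0, n)]  # intervals whose average is at most the threshold
    while stack:
        lo, hi = stack.pop()
        if hi - lo == 1:
            # A single point that was never set aside: |h| <= threshold here.
            good.append(lo)
            continue
        mid = (lo + hi) // 2
        for a, b in ((lo, mid), (mid, hi)):
            average = sum(abs(v) for v in h[a:b]) / (b - a)
            if average > threshold:
                bad.append((a, b))    # stop: this becomes one of the J_i
            else:
                stack.append((a, b))  # keep cutting in half
    return sorted(bad), sorted(good)


bad, good = calderon_zygmund([0, 0, 0, 0, 0, 0, 0, 8])
```

In this example the only bad interval is the last pair of points, where the average is exactly 4, and it covers a quarter of the range. The parent of a bad interval had average at most 2, which is why the average on the bad interval itself cannot exceed 4.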

This is roughly how the Calderon-Zygmund construction works. Let us 
explain how one proves the John-Nirenberg theorem. Fix an interval $I$, and 
assume for simplicity that $\|f\|_* \le 1$. We apply the Calderon-Zygmund 
construction to $h = f - A(f, I)$ on $I$. We get a set on which 
$|f - A(f, I)| \le 2$ and a bunch of intervals $\{I_i\}$ on which the average of 
$|f - A(f, I)|$ is bounded by 4. The main point is to repeat the construction 
on each $I_i$, applied to the function $f - A(f, I_i)$. The assumption that 
$\|f\|_* \le 1$ implies that the average of $|f - A(f, I_i)|$ over $I_i$ is 
bounded by 1. To relate what we get back to $I$ we use the fact that the 
average of $|f - A(f, I)|$ over each $I_i$ is bounded by 4, whence 

$$|A(f, I_i) - A(f, I)| \le 4.$$

We repeat the Calderon-Zygmund construction on each $I_i$; for each one 
we get a new family of intervals, we apply the construction to each of those, 
and repeat the process indefinitely. One can make computations to derive 
the exponential decay mentioned earlier. 
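
One way to organize these computations is sketched below, under the normalization $\|f\|_* \le 1$ and with the constants 2 and 4 coming from the construction. The generation index $k$ and the notation $B_k$ for the union of the bad intervals produced at the $k$-th round are introduced here only for bookkeeping; this is a sketch of the standard argument rather than a complete proof.

```latex
\begin{aligned}
|B_k| &\le 2^{-k}\,|I|
  && \text{(each round keeps at most half of the measure),} \\
|A(f, I') - A(f, I)| &\le 4k
  && \text{for each bad interval } I' \text{ of generation } k, \\
|f(x) - A(f, I)| &\le 2 + 4(k-1)
  && \text{for almost all } x \in I \setminus B_k, \\
\bigl|\{x \in I : |f(x) - A(f, I)| > 4k - 2\}\bigr| &\le 2^{-k}\,|I|
  && \text{for } k = 1, 2, 3, \ldots
\end{aligned}
```

Given $\lambda \ge 2$, choosing the largest $k$ with $4k - 2 \le \lambda$ gives a bound of $2^{-(\lambda - 2)/4}\,|I|$ for the measure of the set where $|f - A(f, I)| > \lambda$, which is exponential decay in $\lambda$.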

For us the Calderon-Zygmund construction is the "lemma" that we are 
applying repeatedly. If we do not make cuts then we have to repeat the 
construction on each interval. Instead we can think of proving it once but 
using it many times, through contractions and cuts. In each application it 
is applied to different functions and intervals, and indeed it is applied to 
intervals that were produced in an earlier application of the lemma. 

We shall not try to make a precise formalization of the proof in logic, 
but it should be clear in principle how the elimination of cuts corresponds 
to performing the Calderon-Zygmund construction explicitly each time it is 
needed, and that with contractions and a cut one can make a smaller proof 
with a lot of "coiling" or "cycling" in the proof, coming from the output of 
the lemma going back into it as input. This idea can be made more precise 
through the logical flow graph, which we discuss in Section 10. 

If one wants to treat the proof of the John-Nirenberg theorem more pre- 
cisely in terms of logic, then it might be better to work with finite versions 
of it. Instead of working with functions on the whole real line let us think 
of functions on the set of integers $1, 2, 3, \ldots, 2^N$, where $N$ is a large positive 
integer. We still have a natural notion of intervals in this case, although for 
technical reasons it is better to restrict ourselves to intervals for which the 
number of elements inside is an integer power of 2. This makes it easier to 
cut them in half. Instead of Lebesgue measure we use counting measure, so 
that the integrals become sums. 

If we do this then we can get the same kind of inequalities as before. 
For the purposes of analysis the precise constants like 2 and 4 are not so 
important, what matters are uniform estimates, with constants which do not 
depend on N in this discrete model. 

In this discrete model one can formalize the John-Nirenberg inequality 
and its proof in fairly simple logical terms. One does not need anything like 
the full structure of the real line or measure theory, and one could also 
restrict oneself to functions which take values in the rationals to avoid any 
use of the real numbers. 

This discrete version of the John-Nirenberg theorem also contains all the 
information of the original. Notice that there is a small additional subtlety 
for the proof. In the real line all intervals are practically the same, one 
can change from one to another using an affine mapping. In the discrete 
model that is not quite the case, one cannot use dilations to identify intervals 
of different length. If one were to formalize a proof with cuts this might 
require different proofs of the Calderon-Zygmund construction for intervals 
of different lengths, rather than having one proof for all intervals. This could 
lead to a requirement for N different lemmas in the proof instead of just one, 
one lemma for each dyadic length $2^j$. 

This expansion in the number of lemmas reflects an interesting differ- 
ence between "continuous" and "discrete" mathematics. There can be more 
symmetries in continuous mathematics which enable one to make shorter 
proofs. 

Notice the strong role of quantifiers in the proof of the John-Nirenberg 
theorem. The argument works because we have information about all 
intervals, as in the condition $\|f\|_* \le 1$. This is a common phenomenon in 
harmonic analysis, that properties of functions or sets are naturally expressed 
in terms of universal quantifiers which reflect crucial scale-invariance. In 
the proof we used universal quantifiers to avoid dealing with specific 
intervals. This led to a kind of cycling in the argument. In the next section we 
describe a different example which is easier to analyze and which exhibits 
similar features of dynamics in short proofs with cuts. 

9 Large numbers 

We want to provide now a more precise example where quantifiers and the 
cut rule can be used to make short proofs, and where explicit proofs would 
need to be much longer. We do this in the context of arithmetic, through 
the concept of "feasible numbers" [46]. The logical flow graph (Section 10) 
associated to the proof described below has a rich structure of cycles. 

To try to avoid overloading the reader with syntax let us not review the 
manner in which arithmetic is formalized in predicate logic. In short one 
specifies symbols to reflect the basic objects and operations in arithmetic 
(0, =, <, +, and * for multiplication), and one adds axioms to encode their 
basic properties. One of the main objects used in this formalization is the 
function $s(x)$, the successor function, which is interpreted in normal 
mathematics as $x + 1$. This is important for the formalization of 
mathematical induction, which we shall not use here. 

We shall also "cheat" a little and add to our version of arithmetic the 
operation of exponentiation with the usual property that $(z^x)^y = z^{x*y}$. 
For the present purpose dealing with the exponential function at the level of 
first principles would be a distraction, but it is certainly possible to define it 
in the framework of Peano arithmetic and to establish its properties. 

Let us add a new object to arithmetic, namely the symbol F which repre- 
sents a unary relation of feasibility. The idea is that if x represents a natural 
number, then $F(x)$ represents the statement that $x$ can be constructed in 
some feasible manner. The properties of F can be described as follows: 

    F(0)
    F : equality      $x = y \supset (F(x) \supset F(y))$
    F : inequality    $F(x) \wedge (y < x) \supset F(y)$
    F : successor     $F(x) \supset F(s(x))$
    F : plus          $F(x) \wedge F(y) \supset F(x + y)$
    F : times         $F(x) \wedge F(y) \supset F(x * y)$

To show that a number $n$ is feasible one ought to be able to construct a proof 
of $F(n)$. We do not allow induction to be used over formulas containing F. 
Otherwise we could prove $\forall x\, F(x)$ in a few steps. 

Note that we are not allowing an exponential rule for F. 

In mathematical logic one sometimes looks at these axioms together with 
$\neg F(\theta)$ for some term $\theta$ without variables. This gives an inconsistent theory, 
but one can study "concrete" consistency, in which proofs with only a "small" 
number of formulas cannot prove inconsistency. 

To work within sequent calculus as before the preceding rules should be 
reformulated, but the point is clear enough and we omit the details. See also 
[12], [18]. 

Given a particular number $n$ one can prove $F(n)$ using the F : successor 
rule $n$ times. Of course one can be more clever than that, using addition and 
multiplication to reduce substantially the length of the proof of the feasibility 
of $n$. It is easy to imagine how one can get down to the realm of $\log n$, but 
on average one should not expect to do much better than that. We want to 
consider some particular large numbers for which one can find much shorter 
proofs of feasibility. 
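
The remark about getting down to the realm of log n can be illustrated by a
small sketch. The function below lists axiom instances that derive F(n) by
reading the binary digits of n: doubling uses F : plus and adding one uses
F : successor. The function name, the step format, and the assumption that
F(0) is available as a starting point are ours, made only for illustration.

```python
def feasibility_steps(n):
    """Return a list of (rule, value) pairs ending with a derivation of F(n).

    Assumption for this sketch: F(0) is taken as given.
    """
    if n < 0:
        raise ValueError("n must be a nonnegative integer")
    steps = []
    value = 0
    for bit in bin(n)[2:]:              # binary digits, most significant first
        if value > 0:
            # F(value) and F(value) give F(value + value) by F : plus
            value = value + value
            steps.append(("F:plus", value))
        if bit == "1":
            # F(value) gives F(s(value)) by F : successor
            value = value + 1
            steps.append(("F:successor", value))
    return steps


# The number of steps is at most twice the number of binary digits of n.
print(len(feasibility_steps(1000)), (1000).bit_length())
```

Compared with n applications of F : successor, the number of steps here is
bounded by twice the number of binary digits of n.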

Define $e_2(n,2)$ recursively for nonnegative integers n by $e_2(0,2) = 1$,
$e_2(n+1,2) = 2^{e_2(n,2)}$. This function has nonelementary growth, i.e., it
grows faster than any finite tower of exponentials. We want to describe a
proof of $F(e_2(n,2))$ with O(n) lines due to Solovay.
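
To get a feeling for the growth of this function, here is a minimal sketch of
the recursion in Python. The name `e2` is ours, and the fixed second argument
2 is suppressed.

```python
def e2(n):
    """Tower function: e2(0) = 1 and e2(n + 1) = 2 ** e2(n)."""
    if n < 0:
        raise ValueError("n must be a nonnegative integer")
    value = 1
    for _ in range(n):
        value = 2 ** value
    return value


# The first few values are 1, 2, 4, 16, 65536.
print([e2(k) for k in range(5)])
# The next value, e2(5), already has 65537 binary digits.
print(e2(5).bit_length())
```

The value after that, e2(6), is 2 raised to a number with 65537 binary
digits and cannot be written out explicitly.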

Fix an n for which we want to prove $F(e_2(n,2))$. Define auxiliary formulas
$F_i$, i = 0, 1, 2, ... as follows. We take $F_0(x) = F(x)$, and we define
$F_i(x)$ recursively by

    $F_i(x)$ is $(\forall z)(F_{i-1}(z) \supset F_{i-1}(z^x))$.

Conceptually we can imagine each formula $F_i(x)$ as defining a finite set of
nonnegative integers. For each i these are the integers x such that the set
corresponding to $F_{i-1}$ is closed under the mapping $z \mapsto z^x$. We can
think of these subsets as becoming "smaller" as i increases, and even in an
exponential way.

The main property of these formulas $F_i(x)$ is that

    $F_i(x), F_i(y) \to F_i(x * y)$

is provable. Let us first explain this in words. This sequent says that if
$F_i(x)$ and $F_i(y)$ both hold, then $F_i(x * y)$ holds too. The hypotheses
imply that $(\forall z)(F_{i-1}(z) \supset F_{i-1}(z^x))$ and that
$(\forall w)(F_{i-1}(w) \supset F_{i-1}(w^y))$ are true. Applying the second
with $w = z^x$ we get that $F_{i-1}(z^x) \supset F_{i-1}((z^x)^y)$ is true for
all z, and hence $F_{i-1}(z) \supset F_{i-1}((z^x)^y)$ is true because of the
assumption on x. This is the same as $F_i(x * y)$, because
$(z^x)^y = z^{x * y}$. It is not hard to formalize this as a proof in sequent
calculus, but we leave the details as an exercise.

Using this property one can show that $\to F_i(2)$ is provable for each i.
Next we claim that

(1) $F_0(2), F_1(2), \ldots, F_n(2) \to F_0(e_2(n+1, 2))$

is provable. To see this we use the following building blocks:

(2) $F_i(e_2(n-i, 2)), F_{i-1}(2) \to F_{i-1}(2^{e_2(n-i, 2)})$

These sequents are easy to prove using the definitions. They can be combined
through cuts to give (1).

We can now conclude that

    $\to F_0(e_2(n+1, 2))$

is provable. This follows from (1) and the provability of $\to F_i(2)$ for
all i, established before. It is not hard to see that our proof had a total of
O(n) lines. Of course this is the same as $\to F(e_2(n+1, 2))$.
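
The way the cuts chain together can be traced with a short sketch. It assumes
only that each step is the instance z = 2 of the definition of the auxiliary
formulas: from F_i(a) and F_{i-1}(2) one obtains F_{i-1}(2^a). The function
name and the bookkeeping by (level, value) pairs are ours.

```python
def chain_of_cuts(n):
    """Follow the chain from F_n(2) down to level 0, one cut per level.

    Returns the list of (level, value) pairs, meaning that F_level(value)
    has been derived at that point of the chain.
    """
    if n < 0:
        raise ValueError("n must be a nonnegative integer")
    level, value = n, 2
    visited = [(level, value)]
    while level > 0:
        # F_level(value) and F_{level-1}(2) give F_{level-1}(2 ** value)
        value = 2 ** value
        level -= 1
        visited.append((level, value))
    return visited


print(chain_of_cuts(3))
```

For n = 3 the chain passes through the values 2, 4, 16, 65536, so n cuts
suffice to reach a tower of height n + 1 at level 0.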

The nesting of quantifiers in this proof is responsible for the strong
compression. Recall that $F_i(x)$ is defined recursively by
$(\forall z)(F_{i-1}(z) \supset F_{i-1}(z^x))$. Thus the number of quantifiers
in $F_i(x)$ is 1 plus twice the number of quantifiers in $F_{i-1}(z)$ and
therefore grows exponentially in i. More importantly, in $F_i(x)$ we have two
occurrences of $F_{i-1}$ which lie within the scope of the same quantifier and
which are linked through the implication. By recursion there is a similar
structure within $F_{i-1}$ and so forth. This nesting of links is crucial
for the compression and leads to rich dynamical structure. The links combine
to form an exponential number of cycles within the logical flow graph of the
proof. See [12].
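
The count of quantifiers can be checked directly from the recursion. The
sketch below assumes that the base formula F contains no quantifiers. The
function name is ours.

```python
def quantifier_count(i):
    """Number of quantifiers in the i-th auxiliary formula.

    Level 0 is F itself (no quantifiers); each further level contributes
    one new quantifier plus two copies of the previous level.
    """
    if i < 0:
        raise ValueError("i must be a nonnegative integer")
    count = 0
    for _ in range(i):
        count = 1 + 2 * count
    return count


# The recursion solves to 2 ** i - 1.
print([quantifier_count(i) for i in range(6)])
```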

There is a notion of curvatures associated to the nesting of links between 
formula occurrences. This notion is used in [13] to read a finitely presented 
group from the cycles coming from a proof. For the example above the 
distortion of a certain subgroup inside the group reflects the compression in
the proof and the expansion associated to cut-elimination. 

10 A geometric view 

How can we really "see" what happens in the preceding examples? Consider
the John-Nirenberg theorem. The Calderon-Zygmund lemma is applied repeatedly
to its own output to provide the existence of certain families of intervals
without listing them explicitly. The proof of the feasibility of
$e_2(n+1, 2)$ describes the number implicitly without writing out all the
multiplications.
In each of these cases an actual computation modelled on the proofs would 
have to visit certain formulas over and over again. There is cycling inside 
the proof with cuts. 

To make this precise we use the concept of a logical flow graph introduced 
by Buss [8]. A different but related graph was introduced earlier by Girard 
[24]. Actually we shall use a modification of Buss' definition, in which we 
restrict ourselves to atomic formulas occurring in the proof. Atomic formulas 
are formulas without logical connectives. In propositional logic this means 
the propositional variables, while in predicate logic it means relations with 
their terms. The restriction to atomic formulas seems to be more natural for 
geometric and dynamical interpretations of proofs. Atomic formulas are like 
particles within a proof, and the logical flow graph traces their motion. To
see cycling it is important to work with atomic formulas, to be able to move 
freely up and down the proof. 

Let $\Pi$ be a proof of some sequent S. The vertices of the logical flow
graph are the occurrences of atomic formulas in $\Pi$. We connect occurrences
of atomic formulas by an edge only when they are variants of each other. 
In propositional logic two occurrences are variants when they represent the 
same formula, and in predicate logic we allow the terms within to be different. 

We attach edges by tracing the logical relationships in a proof. Let us
start with an axiom $A, \Gamma \to \Delta, A$. We do not attach any edges to
the atomic subformulas in $\Gamma$ and $\Delta$. For the pair of distinguished
occurrences of A we attach an edge between each pair of their corresponding
atomic formulas. For instance, if A is $p \vee (p \wedge q)$, then we attach
edges between the two outer p's, the two inner p's, and the two q's.

Next we have to decide how to attach edges in accordance with the rules.
For example, consider the binary rule $\vee$ : left,

$$\frac{A, \Gamma_1 \to \Delta_1 \qquad B, \Gamma_2 \to \Delta_2}{A \vee B, \Gamma_{1,2} \to \Delta_{1,2}}$$

Each atomic formula on the top has a counterpart in the bottom, and we
attach an edge between the two occurrences. No other edges are attached.
Note that each atomic subformula in the A on top is connected by an edge
to its counterpart in the $A \vee B$ in the bottom, and similarly for B.

The other logical rules are treated in practically the same manner as the
$\vee$ : left rule. The cut and contraction rules are treated differently.
For the cut rule

$$\frac{\Gamma_1 \to \Delta_1, A \qquad A, \Gamma_2 \to \Delta_2}{\Gamma_{1,2} \to \Delta_{1,2}}$$

we attach edges between the atomic subformulas of the two copies of the cut
formula A, and between the atomic subformulas of the $\Gamma$'s, $\Delta$'s
occurring in the upper sequents and their counterparts in the lower sequent.
No other edges are attached, and in particular there are no edges from the
cut formulas to the lower sequent. For a contraction,

$$\frac{\Gamma \to \Delta, A, A}{\Gamma \to \Delta, A} \qquad\qquad \frac{A, A, \Gamma \to \Delta}{A, \Gamma \to \Delta}$$

each atomic subformula in each of the two occurrences of A in the upper
sequent is connected to its counterpart in the lower sequent. Again the atomic
occurrences in $\Gamma$, $\Delta$ in the upper sequent are connected to their
counterparts in the lower sequent, and no other edges are attached. This is
the only situation in which we have vertices attached to more than two edges.

Thus the cut and contraction rules are very different geometrically from
logical rules. We shall not pursue this here, but there is much more to be
said about this.

This completes the description of the logical flow graph, except for one
extra ingredient, an orientation. Let us define first the sign of a formula.
This simply counts the number of negations involved. Thus in the formula
$p \wedge (\neg q \vee \neg(\neg r))$, the p and r occur positively and the q
occurs negatively. The logical connectives $\wedge, \vee, \forall, \exists$ do
not affect the sign, but $\supset$ can. An atomic formula occurs positively in
$A \supset B$ if it occurs positively in B or negatively in A, and otherwise
it occurs negatively. This is because of the implicit negation hidden in
$\supset$. Similarly, in a sequent $\Gamma \to \Delta$, an occurrence of an
atomic formula C is said to be positive in the sequent if it occurs positively
in $\Delta$ or negatively in $\Gamma$, and it is said to be negative otherwise.

The logical flow graph has the nice property that any two variants con- 
nected by an edge have the same sign, except for axioms and cut rules, where 
we connect formulas of opposite sign. 
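
The sign convention described above (negations flip the sign, an implication
flips the sign of its left-hand side, and the left side of a sequent flips the
sign once more) can be sketched as follows. The encoding of formulas as nested
tuples and the function names are assumptions made for this illustration.

```python
def atom_signs(formula, sign=1):
    """Yield (atom, sign) for each atomic occurrence, from left to right.

    The sign is +1 for a positive occurrence and -1 for a negative one.
    """
    kind = formula[0]
    if kind == "atom":
        yield formula[1], sign
    elif kind == "not":
        yield from atom_signs(formula[1], -sign)
    elif kind in ("and", "or"):
        yield from atom_signs(formula[1], sign)
        yield from atom_signs(formula[2], sign)
    elif kind == "implies":
        # an implication hides a negation on its left-hand side
        yield from atom_signs(formula[1], -sign)
        yield from atom_signs(formula[2], sign)
    elif kind in ("forall", "exists"):
        # ("forall", variable, body): quantifiers do not affect the sign
        yield from atom_signs(formula[2], sign)
    else:
        raise ValueError(f"unknown connective: {kind}")


def sequent_signs(left, right):
    """Signs within a sequent: formulas on the left are flipped once more."""
    result = []
    for formula in left:
        result.extend(atom_signs(formula, -1))
    for formula in right:
        result.extend(atom_signs(formula, 1))
    return result


# The formula p and (not q or not not r)
p, q, r = ("atom", "p"), ("atom", "q"), ("atom", "r")
example = ("and", p, ("or", ("not", q), ("not", ("not", r))))
print(list(atom_signs(example)))
```

In this example p and r come out positive and q negative.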

We can use the signs of atomic formulas to define a natural orientation
for the logical flow graph in the following manner. For an axiom we have
edges from negative occurrences to positive occurrences. For logical rules
and contractions edges between negative occurrences are oriented upwards
and edges between positive occurrences are oriented downwards. In the cut
rule we do the same except for the edges between the cut formulas, where
the orientation goes from positive occurrences to negative occurrences. This
orientation has a natural interpretation in logic [11].

This defines the logical flow graph associated to a proof as a directed
graph. For the authors this kind of geometric picture is extremely helpful in 
trying to understand the structure of proofs. It provides a way to trace the 
logical relations between different occurrences of a formula in a sequent, and 
a more global alternative to the usual induction arguments. 
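
As a small illustration of the orientation rules, consider a tiny proof made
of two axioms p -> p joined by a cut on p, with endsequent p -> p. The sketch
below writes down its oriented edges and follows them. The vertex labels and
helper names are ours, and the path-following helper assumes that every vertex
has at most one outgoing edge and that there are no cycles, which holds in
this example.

```python
def rule_edge(upper, lower, sign):
    """Orient an edge between an occurrence and its counterpart one line below.

    Positive occurrences (sign > 0) are oriented downwards, negative upwards.
    """
    return (upper, lower) if sign > 0 else (lower, upper)


def axiom_edge(negative, positive):
    """In an axiom the edge runs from the negative to the positive occurrence."""
    return (negative, positive)


def cut_edge(positive, negative):
    """Between the two cut formulas the edge runs from positive to negative."""
    return (positive, negative)


# "ax1.L" is the left (negative) occurrence of p in the first axiom, and so
# on; "end.L" and "end.R" are the occurrences in the endsequent p -> p.
edges = [
    axiom_edge("ax1.L", "ax1.R"),
    axiom_edge("ax2.L", "ax2.R"),
    cut_edge("ax1.R", "ax2.L"),          # the two copies of the cut formula
    rule_edge("ax1.L", "end.L", -1),     # negative side formula: upwards
    rule_edge("ax2.R", "end.R", +1),     # positive side formula: downwards
]


def follow(start, edges):
    """Follow oriented edges from start until a vertex has no outgoing edge."""
    successors = dict(edges)
    path = [start]
    while path[-1] in successors:
        path.append(successors[path[-1]])
    return path


print(" -> ".join(follow("end.L", edges)))
```

The single oriented path enters at the negative occurrence in the
endsequent, climbs to the first axiom, crosses the cut, and descends from the
second axiom to the positive occurrence in the endsequent.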

To understand the meaning of the logical flow graph it is helpful to begin
with some simple observations. Suppose that we have a proof $\Pi$ in
propositional logic in which the occurrences of a variable p come in distinct
connected
components of the graph. Then we can rename all of the occurrences of p in 
one component and still have a valid proof. 

Sometimes occurrences in a sequent have to be linked within any proof 
of it. For example, in the tautology p V ^p qM the two q's must have
the same name and therefore must be connected inside the proof. The two 
p's do not need to be linked. 

In complicated proofs one should expect many links between pairs of 
occurrences in the endsequent. For example, if all cut-free proofs $\Pi$ of a
sequent S are of exponential size compared to the size of S, then there must
be a pair of occurrences in S which are connected in the logical flow graph
by an exponential number of links. This is not hard to derive from the 
definitions. (One has to be careful about weak occurrences, but they can be 
treated as in [11].) Remember from Section 3 that the cut-free proofs of the 
finite versions of the pigeon-hole principle in propositional logic must have 
exponential size. 

From this we see that in some cases proofs must have cycles, i.e., paths 
in the logical flow graph which come back to where they started. However
the cycles obtained as above are unoriented. 

A basic observation (see [11]) is that proofs without cuts have no oriented
cycles. The reason for this is that in a proof without cuts we cannot have
edges from positive occurrences to negative occurrences, and an oriented
cycle must contain such an edge.

Proofs without contractions cannot have oriented cycles either. See [11,
13]. This is more amusing geometrically; one has to go further into the
logical structure of the proof.

It is the combination of cuts and contractions that can lead to cycles.
The proof described in Section 9 has an exponential number of cycles, as
we mentioned before. Another example is given in [12] where cycles help to
make a proof short, although less dramatically.
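
Once the oriented edges of a logical flow graph are written down, the
presence of an oriented cycle can be tested by a standard depth-first search.
The sketch below is generic graph code, not a method from the references, and
the two small edge lists are hand-made examples rather than graphs extracted
from actual proofs.

```python
def has_oriented_cycle(edges):
    """Return True if the directed graph given by (source, target) pairs
    contains an oriented cycle."""
    successors = {}
    for source, target in edges:
        successors.setdefault(source, []).append(target)
        successors.setdefault(target, [])

    WHITE, GREY, BLACK = 0, 1, 2    # unvisited, on the current path, finished
    colour = {v: WHITE for v in successors}

    def visit(v):
        colour[v] = GREY
        for w in successors[v]:
            if colour[w] == GREY:   # w is on the current path: a cycle closes
                return True
            if colour[w] == WHITE and visit(w):
                return True
        colour[v] = BLACK
        return False

    return any(colour[v] == WHITE and visit(v) for v in successors)


path_like = [("a", "b"), ("b", "c")]
cyclic = [("a", "b"), ("b", "c"), ("c", "a")]
print(has_oriented_cycle(path_like), has_oriented_cycle(cyclic))
```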

In general, the compression of proofs appears to be closely related to the
presence of cyclic structures. This is reminiscent of certain phenomena of
distortion in finitely presented groups. It is natural to try to use groups to
represent dynamics within proofs, and this is explored in [13].

There are many natural questions concerning the relationship between
cycles and the lengths of proofs. For instance, in [13] the question is raised
of whether one can eliminate cyclic structures more efficiently than through
cut elimination.

If we are going to admit the idea of geometric structures attached to proofs
(a prominent theme in the algebraic approach to the structure of proofs of
Girard [28, 26, 27, 29]), then we should also think about mappings between
spaces. This opens up a story too long for the present paper, but let us
mention a basic point. The concept of mappings in the context of proofs
suggests that we look for a more general notion of subproof which corresponds
to embeddings. Inner proofs, introduced in [11], provide a candidate for such
a notion. Roughly speaking, an inner proof can spread throughout the original
proof (unlike a subproof), but keeping only some of the wires inside.

Inner proofs provide a notion of localization in a proof. We can ask about
the local and global aspects of cut elimination. Given a proof $\Pi$ and a
cut-free version $\Pi'$, one would like to know whether inner proofs in $\Pi'$
correspond to inner proofs in $\Pi$. This would be useful for matters of
complexity, because it is easier to find inner proofs inside cut-free proofs,
but then one would like to compress them as much as $\Pi$ is compressed. The
usual method of cut elimination does not work well for this endeavor, and an
alternate method is introduced in [11] for propositional logic, where some
conditions for making this transformation are given.

A basic point is that the known procedures of cut elimination either
deform paths in the logical flow graph without breaking them, or they split
the paths. One cannot recognize which alternative occurs in a local way. The
problem of paths being split under cut elimination creates an obstruction for
being able to transfer inner proofs in general. The method of [11] exploits
the extra "determinism" in the history of propositional formulas to preserve
more of the structure of the proof. In this method the splitting of paths is
delayed until the elimination of atomic cuts. This can be accomplished by
an explicit control of contractions.

These issues are related to Craig interpolation in a nice way, not through
the usual formulation of finding an interpolating formula, but in terms of
splitting the given proof $\Pi : A \to B$ when given a truth assignment a to
the common variables of A and B as mentioned at the end of Section 6. If one
could always transfer inner proofs from the cut-free proof $\Pi'$ back to
$\Pi$, then one could get linear bounds for the split proofs for a given
assignment a. This would still not be enough to give an interpolating function
(in the sense of Section 6) computable in polynomial time. All this remains
open.

At this stage one starts to ask oneself a lot of questions about structure 
and proofs. Everything goes on the table, nothing is forbidden. 